
Modify benchmark scripts to look for species_database.yml files in Ref and Dev rundirs #389

Merged
yantosca merged 27 commits into dev from bugfix/ref-and-dev-spcdb on Jan 21, 2026


@yantosca (Contributor) commented Jan 5, 2026

Name and Institution (Required)

Name: Bob Yantosca
Institution: Harvard + GCST

Describe the update

In this PR, we have done the following:

  1. Removed the paths:spcdb_dir YAML tag in *benchmark.yml config files.

  2. Added function get_species_database_files in gcpy/benchmark/modules/benchmark_utils.py, which returns the absolute paths to species_database.yml files in Ref and Dev.

  3. Added function read_species_metadata to gcpy/util.py, which accepts either a single file path to species_database.yml or a list of file paths to the species_database.yml files in the Ref & Dev rundirs, and returns the Ref and Dev species database dicts. If only one file path is passed, the same species database is returned for both Ref and Dev.

  4. Replaced the spcdb_dir keyword argument in several functions with spcdb_files, which can be of type str or list.

  5. Modified gcpy/plot/compare_single_level.py and gcpy/plot/compare_zonal_mean.py to accept spcdb_files instead of spcdb_dir. Added corresponding logic so that the Ref species database is used with Ref data and the Dev species database is used with Dev data.

  6. Fixed several issues:

    • Renamed spcdb to metadata in make_benchmark_aerosol_tables
    • Added parentheses in routine face_area (in cstools.py) to force correct operator order
    • Updated make_benchmark_aerosol_tables to include all dust species in the aerosol burdens table
    • Updated make_benchmark_aerosol_tables to remove hardwiring in the computation of global AOD
    • Restored missing YAML tags in 1yr_fullchem_benchmark.yml configuration file
    • Added structural updates suggested by Pylint
    • Fixed error in the determination of which variables are "Ref only" and "Dev only" in routine create_benchmark_emission_tables
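The Ref/Dev handling described in item 3 can be sketched roughly as follows. This is a minimal illustration, not the gcpy.util implementation: the name read_species_metadata_sketch and the pluggable load argument are hypothetical, and the default loader assumes PyYAML is installed.

```python
from typing import Callable, Union


def _default_yaml_loader(path: str) -> dict:
    """Load a YAML file (assumes PyYAML is installed)."""
    import yaml  # imported lazily so the sketch runs without PyYAML if a loader is passed
    with open(path, encoding="utf-8") as ifile:
        return yaml.safe_load(ifile)


def read_species_metadata_sketch(
        spcdb_files: Union[str, list],
        load: Callable[[str], dict] = _default_yaml_loader,
) -> tuple:
    """
    Hypothetical sketch: return (ref_metadata, dev_metadata) from one
    species_database.yml path or a list of [Ref, Dev] paths.
    """
    # Normalize a single path into a one-element list
    if isinstance(spcdb_files, str):
        spcdb_files = [spcdb_files]
    if not isinstance(spcdb_files, list) or len(spcdb_files) not in (1, 2):
        raise ValueError("spcdb_files must be a path or a list of 1 or 2 paths!")

    ref_metadata = load(spcdb_files[0])
    # One file: reuse the same metadata for Dev; two files: read Dev separately
    if len(spcdb_files) == 1:
        dev_metadata = ref_metadata
    else:
        dev_metadata = load(spcdb_files[1])
    return ref_metadata, dev_metadata
```

Keeping Ref and Dev metadata separate (rather than merging them) lets each dataset be converted with its own molecular weights.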

Expected changes

These updates should allow you to create benchmark plots in the case where Dev contains some species that Ref does not.

gcpy/benchmark/modules/benchmark_scrape_gchp_timers.py
- Now import ENCODING from gcpy.constants and use that instead of
  the hardwired "utf-8" in open statements
- Added routine "check_file_for_timing_info", which tests if a text
  file has GCHP timing info.  If not, it returns the file path to
  "allPEs.log".  This is necessary due to a change in behavior
  in MAPL 2.59.
- In routine "read_one_text_file"
  - Now call "check_file_for_timing_info" before reading the file.
  - Strip out MAPL text that is usually placed into allPEs.log.
  - Skip all lines after the GCHPchem section until
    the Summary section.
  - Change the marker for the Summary section to "Report on process:"
    (i.e. without a number).
  - Changed the break-out-of-loop command for cloud benchmark log files
    to "++", which occurs after the timers section.  This is only for
    MAPL versions prior to 2.59.
This merge brings PR #388 (Update "benchmark_scrape_gchp_timers.py"
to look for timing information in "allPEs.log" if not found in the
GCHP log file, by @yantosca) into the GCPy 1.7.0 development stream.

PR #388 updates the "benchmark_scrape_gchp_timers.py" script to check
if the GCHP timers output is present in the log file.  If not, it will
read it from the allPEs.log file in the run directory.  This is needed
because GCHP simulations with MAPL v2.59+ now send timers output to
the allPEs.log file.
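The fallback described above could look something like this minimal sketch. It assumes that allPEs.log sits in the same directory as the GCHP log file and that the "Report on process:" Summary marker indicates timer output; the function name is hypothetical and not the routine in benchmark_scrape_gchp_timers.py.

```python
import os


def check_file_for_timing_info_sketch(path, marker="Report on process:"):
    """
    Hypothetical sketch: return `path` if it contains GCHP timer output,
    otherwise return the path to allPEs.log in the same directory.
    The `marker` default is an assumption used to detect timer output.
    """
    with open(path, encoding="utf-8") as ifile:
        for line in ifile:
            if marker in line:
                return path
    # No timers found: assume MAPL 2.59+ wrote them to allPEs.log instead
    return os.path.join(os.path.dirname(path), "allPEs.log")
```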

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/benchmark_utils.py
- Import verify_variable_type from gcpy.util
- Added function get_species_database_files, which reads the config
  object and returns paths to the species_database.yml files
  in Ref and Dev rundirs
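A rough sketch of how such a lookup might work is shown below. The config layout (paths:main_dir and data:ref|dev:<model>:dir) is an assumption about the benchmark YAML, and the function name is hypothetical; the SPECIES_DATABASE filename constant mirrors the one mentioned in a later commit.

```python
import os

SPECIES_DATABASE = "species_database.yml"


def get_species_database_files_sketch(config, ref_model, dev_model):
    """
    Hypothetical sketch: build absolute paths to species_database.yml
    in the Ref and Dev run directories.  The config keys used here
    are assumptions, not the actual GCPy config schema.
    """
    main_dir = config["paths"]["main_dir"]
    ref_dir = config["data"]["ref"][ref_model]["dir"]
    dev_dir = config["data"]["dev"][dev_model]["dir"]
    ref_file = os.path.abspath(os.path.join(main_dir, ref_dir, SPECIES_DATABASE))
    dev_file = os.path.abspath(os.path.join(main_dir, dev_dir, SPECIES_DATABASE))
    return [ref_file, dev_file]
```

For a GCHP vs GCC comparison, one would call it with ref_model="gcc" and dev_model="gchp", then pass the returned list as spcdb_files.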

gcpy/util.py
- Added function read_species_metadata, which returns the species
  metadata for the union of species in multiple species_database.yml
  files.
- Added code updates suggested by Pylint
- Updated comments

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/cloud/*.yml
gcpy/benchmark/config/*.yml
- Removed "paths:spcdb_dir" YAML tag
- Added "ref:gcc:species_metadata", "ref:gchp:species_metadata",
  "dev:gcc:species_metadata", and "dev:gchp:species_metadata" YAML
  tags to specify separate species_database.yml files for Ref and
  Dev rundirs

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/plot/compare_single_level.py
gcpy/plot/compare_zonal_mean.py
- Import "read_species_metadata" from gcpy.util
- Add "spcdb_files=None" as a keyword argument
- Throw an error if spcdb_files is None while convert_to_ugm3=True


Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/oh_metrics.py
- Import constants from gcpy.constants by name
- Import functions from gcpy.util by name
- Updated PyDoc headers
- Removed special handling for xr.open_mfdataset
- Renamed "ds" to "data"
- Implement suggestions from Pylint:
  - Snake case naming for variables

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/budget_ox.py
gcpy/benchmark/modules/budget_tt.py
- Import constants from gcpy.constants by name
- Pass spcdb_file as an argument; remove spcdb_dir
- Updated PyDoc headers
- Implemented suggestions from Pylint:
  - Snake case/lower case for variables
  - Do not use a list as a default value for keyword arguments
  - Specify an encoding in open() statements
  - Use f-strings instead of ".format" in print statements
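The three Pylint-driven patterns listed above are illustrated in the following hypothetical function (it is not taken from budget_ox.py or budget_tt.py; the name and output format are made up for the example).

```python
def write_budget_lines(path, species=None, values=None):
    """
    Hypothetical illustration of the Pylint suggestions listed above.
    """
    # Use None instead of a mutable list default (Pylint W0102,
    # dangerous-default-value), then build a fresh list on each call.
    if species is None:
        species = []
    if values is None:
        values = [0.0] * len(species)

    # Specify an explicit encoding in open() (Pylint W1514, unspecified-encoding)
    with open(path, "w", encoding="utf-8") as ofile:
        for name, value in zip(species, values):
            # f-string instead of "{}".format(...) (Pylint C0209, consider-using-f-string)
            print(f"{name:<10s} {value:12.6f}", file=ofile)
```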

gcpy/benchmark/modules/ste_flux.py
- Removed "import os"
- Import constants from gcpy.constants by name
- Removed special handling for xr.open_mfdataset
- Implemented suggestions from Pylint:
  - Snake case/lower case for variables
  - Do not use a list as a default value for keyword arguments
  - Specify an encoding in open() statements

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/benchmark_funcs.py
- Pass spcdb_files as a positional argument
- Removed spcdb_dir keyword argument
- Import functions by name from gcpy.util
- Updated PyDoc headers
- Verify argument types with verify_variable_type()
- Use "read_species_metadata" function to read the species database
  files from Ref & Dev and return the union of species
- Cosmetic changes (indentation, line breaks, comments)
- Pass spcdb_files as a keyword argument to compare_single_level
- Pass spcdb_files as a keyword argument to compare_zonal_mean
- Implemented suggestions from Pylint:
  - Use snake case/lower case for variables
  - Specify encoding when opening text files for output

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/benchmark_drydep.py
- Add spcdb_files as a positional argument
- Remove spcdb_dir keyword argument
- Updated PyDoc headers
- Pass spcdb_files as a keyword argument to make_benchmark_drydep_plots

gcpy/benchmark/modules/benchmark_mass_cons_table.py
- Use "read_species_metadata" to read species database files from Ref
  & Dev, and return the union of all species
- Pass spcdb_files as an argument

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/benchmark_scrape_gchp_timers.py
- Added "check=False" as an argument to subprocess.run,
  as suggested by Pylint.
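For reference, the effect of check=False is sketched below. The command run here is a placeholder, not the one used in benchmark_scrape_gchp_timers.py.

```python
import subprocess
import sys

# check=False states explicitly that a nonzero exit status will NOT raise
# CalledProcessError (Pylint W1510, subprocess-run-check).  The caller
# is then responsible for inspecting returncode.
result = subprocess.run(
    [sys.executable, "-c", "print('timers')"],  # placeholder command
    capture_output=True,
    text=True,
    check=False,
)
if result.returncode != 0:
    print(f"Command failed with status {result.returncode}")
```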

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/benchmark_species_changes.py
- Pass spcdb_files as a keyword argument
- Use "read_species_metadata" to read the species_database.yml files
  for Ref & Dev, and return the union of all species
- Updated PyDoc headers

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/run_1yr_fullchem_benchmark.py
gcpy/benchmark/modules/run_1yr_tt_benchmark.py
gcpy/benchmark/run_benchmark.py
- Remove "import get_species_database_dir" from benchmark_funcs.py
- Added "import get_species_database_files" from benchmark_utils.py
- Call get_species_database_files at the beginning of GCC vs GCC,
  GCHP vs GCC, GCHP vs GCHP and Diff of Diffs to get species_database.yml
  files from Ref & Dev folders.  Store in spcdb_files.
- Pass spcdb_files as an argument to benchmarking routines
- Remove spcdb_dir keyword argument from calls to benchmarking routines
- Implement fixes and suggestions from Pylint

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/benchmark_funcs.py
- In make_benchmark_conc_plots and make_benchmark_emis_plots,
  test if ref and dev are of types (str, list)
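A type check of this kind amounts to an isinstance test against a tuple of types. The helper below is a hypothetical sketch; it does not reproduce the signature of gcpy.util.verify_variable_type.

```python
def verify_ref_dev_types_sketch(ref, dev):
    """
    Hypothetical sketch: Ref and Dev may each be a single file path (str)
    or a list of file paths; anything else raises TypeError.
    """
    for label, value in (("ref", ref), ("dev", dev)):
        if not isinstance(value, (str, list)):
            raise TypeError(
                f"{label} must be of type str or list, got {type(value).__name__}"
            )
```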

gcpy/benchmark/modules/benchmark_utils.py
- Added missing print statement for the path to the species_database.yml
  in the Ref run directory

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
yantosca added this to the 1.7.0 milestone Jan 5, 2026
yantosca self-assigned this Jan 5, 2026
yantosca added the "topic: Benchmark Plots and Tables" and "category: Bug Fix" labels Jan 5, 2026
gcpy/benchmark/modules/ste_flux.py
- Fixed typo "is_TransportTracers" -> "is_transport_tracers"
- Cosmetic changes (removed line break in import statement)

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/run_1yr_tt_benchmark.py
- Bug fix: Make sure "spcdb_files" is the 5th argument (following
  dev_label) in the call to make_benchmark_mass_conservation_table
  for GCC vs GCC.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/benchmark_categories.yml
- Removed the st_Ox species, as it is no longer included in
  TransportTracers simulations

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/config/1yr_fullchem_benchmark.yml
- Restored the following YAML tags, which were inadvertently
  omitted from a previous commit
  - "plot_models_vs_obs: True"
  - "aer_budget_table: True"
  - "Ox_budget_table: True"

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
yantosca force-pushed the bugfix/ref-and-dev-spcdb branch from 3d5b94d to 56f1893 on January 8, 2026 17:47
gcpy/benchmark/modules/benchmark_funcs.py
- In routine "make_benchmark_aerosol_tables", we have renamed the
  "spcdb" variable to "properties", as this is the dict that stores
  the species database metadata.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/cstools.py
- Changed x_lat = i_lat + i_vert in (0, 3)
  to      x_lat = i_lat + (i_vert in (0, 3))
  in order to enforce proper evaluation order.

This fixes an error that was introduced in commit 2b0c6ad, when we
added some improvements suggested by Pylint.
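The precedence problem can be reproduced in isolation. In Python, "+" binds more tightly than the membership test "in", so the unparenthesized expression evaluates the sum first and returns a bool. The values below are arbitrary and are not taken from face_area.

```python
i_lat, i_vert = 5, 3

# Parsed as (i_lat + i_vert) in (0, 3): a bool, not an index
buggy = i_lat + i_vert in (0, 3)     # (5 + 3) in (0, 3) -> False

# Parenthesized: add 1 to i_lat only when i_vert is 0 or 3 (True == 1)
fixed = i_lat + (i_vert in (0, 3))   # 5 + True -> 6
```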

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/benchmark_funcs.py
- Now include old and new dust species names in species_list
- Update logic so that all dust species are included in the
  aerosol burdens table
- Skip dust species (DST* or DSTbin*) that are not found in the data
  files
- Renamed spc2name to full_names, and use the species database to
  get the long name for each species
- Updated the logic so that DST1 is not hardwired as a species name
  in the AOD table section
- Added internal routine print_aods to print the global AOD table
  separately from routine print_aerosol_metrics
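The "include old and new names, then skip missing dust species" logic might be sketched as below. The regular expression for DST*/DSTbin* names and the idea of checking plain species names against the data are assumptions made for illustration; the function name is hypothetical.

```python
import re

# Matches old (e.g. DST1) and new (e.g. DSTbin1) dust species names (assumed pattern)
DUST_PATTERN = re.compile(r"^DST(bin)?\d+$")


def filter_species_list_sketch(species_list, data_vars):
    """
    Hypothetical sketch: keep every non-dust species, but keep a dust
    species only if it is actually present in the data.
    """
    present = set(data_vars)
    return [spc for spc in species_list
            if not DUST_PATTERN.match(spc) or spc in present]
```

With a species list containing both DST1..DST4 and DSTbin1..DSTbin7, a run that only writes the DSTbin* species ends up with just those in the burdens table, without any name being hardwired.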

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
yantosca requested a review from lizziel January 9, 2026 19:42
yantosca marked this pull request as ready for review January 9, 2026 19:43

@yantosca (Contributor, Author) commented Jan 9, 2026

Samples of the output:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Annual average global aerosol burdens for 2019 in gcc.14.7.0-rc.0
 (weighted by the number of days per month)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

                                                                   Strat         Trop         Strat+Trop
                                                                   -----------   ----------   ----------
Hydrophilic black carbon aerosol          (BCPI   ) burden [Tg] :  0.002118068   0.09570427   0.09782234

Hydrophilic organic carbon aerosol        (OCPI   ) burden [Tg] :  0.007041742   0.41990561   0.42694735

Sulfate                                   (SO4    ) burden [Tg] :  0.359306033   1.39562352   1.75492956

Dust aerosol, Reff = 0.151 microns        (DSTbin1) burden [Tg] :  0.000355288   0.03622999   0.03658528

Dust aerosol, Reff = 0.253 microns        (DSTbin2) burden [Tg] :  0.001523766   0.16645760   0.16798137

Dust aerosol, Reff = 0.402 microns        (DSTbin3) burden [Tg] :  0.009651065   1.17880181   1.18845288

Dust aerosol, Reff = 0.818 microns        (DSTbin4) burden [Tg] :  0.017383738   2.88607972   2.90346345

Dust aerosol, Reff = 1.491 microns        (DSTbin5) burden [Tg] :  0.009823191   5.54217088   5.55199407

Dust aerosol, Reff = 2.417 microns        (DSTbin6) burden [Tg] :  0.002355801   6.25728160   6.25963740

Dust aerosol, Reff = 3.721 microns        (DSTbin7) burden [Tg] :  0.000706825   7.97738362   7.97809044

Fine (0.01-0.05 microns) sea salt aerosol (SALA   ) burden [Tg] :  0.000915225   0.32516280   0.32607802

Coarse (0.5-8 microns) sea salt aerosol   (SALC   ) burden [Tg] :  0.000464604   2.69733069   2.69779530

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Annual average global AODs for 2019 in gcc.14.7.0-rc.0
 (weighted by the number of days per month)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

                                Strat         Trop         Strat+Trop
                                -----------   ----------   ----------
Dust column optical depth [1]:  0.000099603   0.02248508   0.02258468

BCPI column optical depth [1]:  0.000027332   0.00146519   0.00149252

OCPI column optical depth [1]:  0.000087401   0.01970445   0.01979185

SALA column optical depth [1]:  0.000018285   0.01478862   0.01480691

SALC column optical depth [1]:  0.000002156   0.02047868   0.02048084

SO4  column optical depth [1]:  0.000330299   0.03358278   0.03391308

gcpy/benchmark/modules/benchmark_funcs.py
- Obtain separate species metadata from Ref & Dev for use
  in unit conversions (MW_g is the most relevant)
- Removed some superfluous error checks
- In routine make_benchmark_operations_budget, use metadata
  from both Ref & Dev to determine if it is a wet depositing species
- Updated comments

gcpy/benchmark/modules/benchmark_utils.py
- Added constant SPECIES_DATABASE
- Use SPECIES_DATABASE constant when constructing file paths in
  function get_species_database_files

gcpy/benchmark/modules/oh_metrics.py
gcpy/benchmark/modules/budget_tt.py
gcpy/benchmark/modules/benchmark_mass_cons_table.py
- For now, use only the Dev species metadata (e.g. mol. wt.)

gcpy/plot/compare_single_level.py
gcpy/plot/compare_zonal_mean.py
- Now obtain species metadata for Ref & Dev separately
- Now use molecular weights from the Ref and Dev species metadata
  when converting units to ug/m3 (via get_molwt_from_metadata function)

gcpy/util.py
- In routine "read_species_metadata", return Ref and Dev species
  metadata separately instead of taking the union.  If only one file
  is passed, return the same metadata for Ref and Dev.
- Added routine "get_molwt_from_metadata"

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
yantosca changed the title from "Modify benchmark scripts to look for species_database.yml files in Ref and Dev rundirs, and to take the union of all species" to "Modify benchmark scripts to look for species_database.yml files in Ref and Dev rundirs" Jan 12, 2026
yantosca force-pushed the bugfix/ref-and-dev-spcdb branch from 1197b16 to 2dfbf86 on January 12, 2026 21:31
gcpy/benchmark/cloud/template*.yml
gcpy/benchmark/config/*.yml
- Removed the species_metadata tags, as we assume that the species
  database name is always species_database.yml

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/util.py
- Bug fix: Function get_molwt_from_metadata now returns the
  Ref species metadata first, followed by Dev.  The order had been
  reversed inadvertently.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/plot/compare_single_level.py
gcpy/plot/compare_zonal_mean.py
- Edited the program logic when converting units to ug/m3 so that:
  - If neither the Ref nor the Dev metadata contains a molecular
    weight, print a warning message and skip to the next species.
  - If the Ref metadata contains a molecular weight but Dev does not,
    do the unit conversion for Ref, but set Dev to np.nan.  Also
    print a warning message.
  - If the Dev metadata contains a molecular weight but Ref does not,
    do the unit conversion for Dev, but set Ref to np.nan.  Also
    print a warning message.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
…s code

gcpy/benchmark/modules/benchmark_funcs.py
- Call "compare_varnames" and construct the "cvars", "refonly", and
  "devonly" variables before adding missing variables (as NaN fields)
  to refdata and devdata.  This fixes a problem where all of the variables
  are considered common to both Ref & Dev even when they are not.
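Why the order matters can be shown with plain Python sets standing in for the Ref and Dev variable names. This is a hypothetical sketch and does not reproduce the signature of compare_varnames: once each side has been padded with the other side's missing variables, every variable looks common.

```python
def compare_then_pad_sketch(ref_vars, dev_vars):
    """
    Hypothetical sketch: classify variables first, then pad each side
    with the variables it is missing (which would be NaN fields).
    """
    ref_vars, dev_vars = set(ref_vars), set(dev_vars)
    cvars = sorted(ref_vars & dev_vars)    # common to Ref and Dev
    refonly = sorted(ref_vars - dev_vars)  # only in Ref
    devonly = sorted(dev_vars - ref_vars)  # only in Dev

    # Only now add the missing variables to each side
    ref_padded = ref_vars | set(devonly)
    dev_padded = dev_vars | set(refonly)
    return cvars, refonly, devonly, ref_padded, dev_padded
```

Comparing ref_padded with dev_padded instead would report no Ref-only or Dev-only variables at all, which is the symptom the commit describes.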

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/budget_ox.py
- Renamed "spcdb_file" -> "spcdb_files" and updated comments
- In routine "self.get_conv_factors", skip the spcdb (species
  metadata) from Ref and take the spcdb from Dev

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
gcpy/benchmark/modules/mass_cons_tables.py
- Fixed typo "species_datbase.yml" -> "species_database.yml"

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
yantosca merged commit 4371b0a into dev Jan 21, 2026
yantosca deleted the bugfix/ref-and-dev-spcdb branch January 21, 2026 21:24

Labels

category: Bug Fix; topic: Benchmark Plots and Tables


Development

Successfully merging this pull request may close these issues.

make_benchmark_aerosol_tables fails with new dust species in GEOS-Chem 14.7.0
